Using tensor hypercontraction density fitting to achieve an 0(L 4 ) CISD algorithm 
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Recently, Hohenstein et al[l[ introduced tensor hypercontraction density fitting to decompose the 
rank-4 electron repulsion integral tensor as the product of five rank-2 tensors. In this paper, we use 
this methodology to construct an algorithm which calculates the approximate ground state energy 
in 0{L A ) operations. We test our method using several small molecules and show that we quickly 
approach the CISD limit with a small number of auxiliary functions. 

<n: 

The problem of the rapid growth of the electronic wavefunction with system size has plagued quantum chemistry 
for decades^, There have been many attempts to conquer the 'curse of dimensionality' and a large number of 
\ highly successful approximations have been developed. The difficulty inherent in electronic structure calculations is 
that correlated methods usually scale as high orders of the number of basis functions involved in the calculation. 
A straightforward implementation of Hartree-Fock scales as 0(L ) where L is the number of basis functions. Ap- 
proximate methods employed to capture correlation usually face a trade-off between accuracy and efficiency [J, [5j. 
Perturbative treatments such as MP2 and MP4 scale as 0(L 5 ) and 0(L 6 ), respectively Q. Configuration interaction 
methods, which add in single-, double-, triple- and quadruple order excitations, also form a hierarchy of methods 
which scale as 0(L 5 ) and higher [(|. A plethora of other, highly accurate methods based on coupled-cluster theory^, 
2-particle reduced density matrices^, [![, and reduced active space diagonalization[9] also scale as at least 0(L 6 ). 
Due to this high scaling, these methods are generally not applicable beyond small molecules. Instead, quantum 
chemists have increasingly turned to density functional theory (DFT), which offers low 0{L A ) or even 0(L 3 ) formal 
O • scaling while managing to capture varying amounts of the electronic correlation [10j. Nonetheless, the search for ef- 
ficient wavefunction-based methods that are competitive with DFT has continued, motivated by the importance of 
strong correlation in many systems such as transition metal clusters, solid state devices, and molecules far from their 
C/5 ■ equilibrium geometries. 

One approach which attemps to reduce both the cost and scaling of correlated electronic structure methods is the 
decomposition of the electronic repulsion integral (ERI) tensor, which is naturally a rank-4 object, into products of 
lower-rank objects. Resolution-of-the-identity techniques [lll - flij are one subset of this approach, as are pseudospectral 
methods [IB-[l3- Our algorithm was strongly motivated by the recent work of Hohenstein et al.jj}, who developed 
a method which they named 'tensor hypercontraction density fitting' (THC-DF). The authors showed that their 
decomposition could be used to derive 0(L 4 ) MP2 and MP3 algorithms and suggested that it could also be used 
to improve the efficiency of a wide variety of electronic structure methods. Here, we apply their idea to the CISD 
\ method and show that it yields a dramatic reduction of computational cost, from 0(L 6 ) for normal CISD to 0(L 4 ) for 
■ our method. Our method will rigorously approach the traditional CISD energy as the number of auxiliary functions 
is increased. Therefore, our method will face the same challenges, such as lack of size-extensivty, as the traditional 
CISD method. However, we believe that the substantial reduction in computational scaling exhibited by our method 
more than makes up for the well-known shortcomings of CISD, particularly given the fact that numerous methods 
exist for the correction of these shortcomings fl8) . 

The electronic Hamiltonian of an atom or molecule can be written in 2nd-quantized notation as 
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where h l k is the rank-2 matrix of 1-electron integrals and ejy = (ik\jl) is the rank-4 electronic repulsion integral tensor. 
The ERI can be decomposed using a set of Ph auxiliary functions as 



Ph 

Ak 



^ y %ia%kaZab%jb%lb- (3) 



a,b=l 

This decomposition can be achieved trivially by writing e as a L 2 x L 2 matrix and diagonalizing it to obtain eigenvector 
Vq. and eigenvalues d x . We can then pick some arbitrary orthonormal 1-electron basis Xi a and decompose the 
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eigenvectors v* k into a product of rank-1 objects by solving L 2 linear equations 

L 2 

Vi k = ^ylx ia x ka . (4) 

a=l 

Letting 

L 2 

z ab = Y,y x a dX yb ( 5 ) 

we obtain the exact decomposition in Eq. The problem with this decomposition is that 1) it involves a large 
number of auxiliary basis functions Pjj = L 2 and 2) it is based on the diagonalization of the L 2 x L 2 matrix e, which 
requires 0(L e ) operations. 

One of the most important conclusions of [l| was that real molecular ERI tensors can be well-approximated to 
mH accuracy by the expression in Eq. ([3|) with a number of auxiliary functions that scale only as Pr = O(L). 
The authors also devised an algorithm to obtain the decomposition in Eq. ((3]) in 0(L A ) operations. Although this 
previous result is crucial to ensuring the low scaling of our overall algorithm, for the purposes of testing the intrinsic 
accuracy of our method, we will use the exact decomposition prescribed in Eq. ([5]) to avoid any errors inherent to 
the approximation of the ERI tensor. Another important feature of the THC-DF decomposition is that it enables the 
efficient 0(L 3 ) transformation of the ERI from the atomic orbital basis to any other arbitary one-electron basis, such 
as the orthonormal molecular orbital basis. Because of this simple transformation property, we will assume that the 
ERI is already written an orthonormal basis such as the molecular orbital basis. 

Our CISD method itself is based on a parametrization of a single-Slater determinant reference wavefunction, usually 
taken to be the Hartree-Fock ground state. We parametrize our trial wavefunction \<j>) as follows: 

\(j>) =A\V),A = A +J2 ^c\c k c\ci (6) 

ikjl 

where |\&) is our single-Slater determinant reference wavefunction and A is an arbitrary 2-body excitation operator 
which need not be expressed in the natural orbital basis of |'F). This latter condition is important for understanding 
our algorithm, since A need not excite only from occupied to virtual orbitals as is often assumed in CISD or CCSD 
derivations. The constant A is added to A to guarantee that the original reference function is contained within our 
ansatz. Because of the spin-invariance of non-relativistic molecular Hamiltonians, we can constrain our ansatz to 
preserve the number of a and /? electrons. For this reason, our actual ansatz is 

A = Aq + A aa + A a p + App. (7) 

where the operators A aa , A a p and App act on two spin-up, one spin-up and one spin-down, or two spin-down electrons, 
respectively. In what follows, we will ignore this simplification, as it needlessly complicates notation. 
Noting that A l £ is a rank-4 tensor, we apply the THC-DF decomposition from [l| and express it as 

Pa 

Afi = ^ XiaXkaZabXjbXlb (8) 
a, 6=1 

If we let Pa = L 2 , we can represent any 2-body excitation operator A and hence any CISD wavefunction \<fi). For 
smaller values of Pa, we do not span the entire CISD space but nonetheless have a very efficient and interesting repre- 
sentation of our trial wavefunction. As we will show in our examples, we are able to obtain excellent approximations 
to the CISD energy with P A < L. 

To obtain an approximate ground state energy, we substitute our trial wavefunction into the standard variational 
energy expression to obtain 

E$ = (<f>\H |$)/($|$) (9) 

= ((Mi) + (AM)) / (AA) (io) 

where expectation values are taken over the reference wavefunction | 1 F). The expressions on the right- hand- side of 
Eq. (ITU1) look formidable because they involve calculating the expectation value of 4- to 6-body operators. It is at this 



3 



stage that the THC-DF decomposition becomes crucially important because it allows us to evaluate the quantities in 
Eq. (fT0|) with only 0(L 4 ) operations. To see how, let us consider the most complicated term in Eq. (fTO)) . ^Aff^V 
which involves the evaluation of a 6-body operator. Plugging in our decompositions in Eqs. ([3]) and ([5]), we obtain 

AH 2 Aj = ^ (.XiaXkaZXjbXlb) {x mc X oc Z cd X nd X v d) {XqeXseZ e fXrfXtf) ^cj ^^^^n^P^ 8 ^) (H) 

where we sum over all repeated indices. 

Using the canonical fermionic anticommutation relations, we can express the operator in Eq. in canonical 
ordering, with all creation operators on the left and all annihilation operators on the right. Once the operators 
are canonically ordered, their expectation values over the single Slater determinant wavefunction can be written 
exactly as antisymmetrized products of the 1-particle reduced density matrix [l9l |20T | . 

x D ik = (*| cjcfc |*) . (12) 

For instance, we can write 

(4c}c l c k )= 1 Di'Di- 1 Di 1 Di. (13) 

Expressing the expectation value of an n-body operator over the wavefunction |\&) will require n\ separate terms, so 
the result will be lengthy. 

The question that immediately arises is whether these lengthy expressions can be evaluated efficiently. At this 
point, the THC-DF decompositions employed in Eqs. © and © become extremely important. Because both the 
Hamiltonian and excitation operators are decomposed in terms of rank-2 tensors, a given index can be connected to 
at most three other indices. For instance, the index a in Eq. (jlip is involved only in the terms XaiiXak and Z a \>. As 
a result, we can hold i, k and b constant and perform the sum over a, storing the result 

Xikb = ^^XaiXakZ a b (14) 
a 

for future use. When we implement all the terms in Eq. (fTTj) . we find that we can evaluate all terms in our expression 
by summing over a single index while holding at most three others constants. Provided that both Ph and Pa scale as 
O(L), this fact entails that the expected energy can be evaluated in 0(L 4 ) operations, albeit with a large prefactor. 
Furthermore, when we take the analytic gradient of our energy with respect to the variational parameters Xai and 
Zab, we find that this result also requires only 0(L 4 ) operations. An illustrative example of the terms involved in our 
calculation is provided in the Appendix. 

Our final algorithm is implemented as follows: We begin with some initial guess of the variational parameters Xia 
and Z a b. Currently, we have found that selecting a random matrix Xia and setting Z — appears to work well, 
especially when Pa is sufficiently large. For small values of Pa < 5, the minimization can become trapped in local 
minima, but sampling a small number of initial conditions guarantees that a repeatable global minimum will be found. 
We then perform a quasi-Newton minimization on the expected energy E$ in Eq. ©, seeking to minimize it with 
respect to our variational parameters Xia and Z a t using the LBFGS optimization algorithm described in [2lL [22| ■ The 
minimiation converged to within 1 mH of its final result within 300 iterations for all the molecules studied and we 
expect that miminization time can be greatly reduced by using better initial guesses and better approximations to 
the Hessian in the LBFGS algorithm. Nonetheless, since both the calculation of the energy and its gradient require 
0(L 4 ) operations, our minimization algorithm can likewise be accomplished in 0{L A ) time. Upon minimization, the 
energy we obtain will be an upper bound on the CISD energy and will approach the CISD result as the number of 
auxiliary functions Pa is increased. 

Our first set of results in Table I shows the electronic energies obtained for a selection of small molecules in a minimal 
STO-6G basis set. The RHF and CISD results and the ERIs are calculated using Gaussian0£)[23j]. The remainder 
of the table shows the energies calculated by our method as a function of the size of the auxiliary basis Pa- We see 
that our method converges to the CISD result as we increase the size of the auxiliary basis. More surprising is how 
few auxiliary functions are needed to obtain a very accurate approximation to the CISD result. For instance, almost 
95% of the CISD correlation energy in the molecule LiH could be recovered by our algorithm with only two auxiliary 
functions Pa = 2. For other molecules a larger number of auxiliary functions were required to ensure good convergence 
to the CISD answer. But in all cases, in order to converge recover 98% of the correlation energy present in the exact 
CISD result, no more than Pa = 6 auxiliary functions were required. This result is important because Pa must scale 
as O(L) to ensure that our overall algorithm retains a scaling of 0(L A ). Note that for large Pa, our algorithm recovers 
slightly more correlation energy than is recovered by the CISD algorithm due to spin-contamination in our algorithm. 
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E a (H) 


Correlation Energy (mH) 


Molecule 


Ehf 


CISD 


Pa = 2 


Pa = 4 


Pa = 6 


Pa = 10 


BH 


-27.1484 


55.7 


37.8 


55.4 


55.8 


55.9 


LiH 


-8.9182 


20.9 


20.1 


21.0 


21.0 


21.1 


BeH2 


-19.0803 


34.5 


29.7 


32.2 


34.7 


34.8 


CH2 


-44.7828 


58.1 


35.7 


55.0 


57.2 


58.2 


HF 


-103.1456 


66.7 


65.0 


66.1 


66.4 


66.7 


H20 


-84.7683 


50.7 


46.0 


47.2 


50.0 


50.5 



TABLE I: Ground state energies for a variety of small molecules in the STO-6G basis set. As the number of auxiliary functions, 
Pa, is increased, the energy obtained by our algorithm approach the exact CISD result. In all cases, our method with Pa = 6 
recovers more than 98% of the CISD correlation energy. 





E a (H) 


Correlation Energy (mH) 


Molecule 


Ehf 


CISD 


Pa = 2 


Pa = 4 


Pa = 6 


Pa = 10 


BH 


-27.2559 


59.9 


32.1 


52.8 


59.0 


59.8 


LiH 


-8.9475 


19.0 


17.5 


18.6 


19.1 


19.2 


BeH2 


-19.1161 


39.4 


16.8 


34.4 


38.8 


39.3 


CH2 


-44.8861 


84.3 


36.6 


58.2 


77.2 


81.6 


HF 


-103.6346 


144.9 


67.8 


128.1 


139.6 


143.3 


H20 


-85.0717 


130.2 


47.3 


111.0 


121.0 


125.8 



TABLE II: Ground state energies for a variety of small molecules in the 6-31G basis set. As the number of auxiliary functions, 
Pa, is increased, the energy obtained by our algorithm approach the exact CISD result. In all cases, our method with Pa = 10 
recovers more than 96% of the CISD correlation energy. 

While the CISD algorithm is spin-projected such that only singlet states are included, our algorithm allows mixing 
into states of other spin- multiplicities, yielding energies that are ever so slightly lower than the spin-pure result. In 
all cases, this effect was negligible, amounting to less than 0.3 mH. 

Table II shows the same results for the same molecules using the larger 6-31G basis. Again, we see a rapid 
convergence to the exact CISD result as a function of Pa- These results also support the inference that only Pa — 0{L) 
auxiliary functions are needed to converge to the exact result, since in all cases studied, setting Pa = L recovers > 97% 
of the exact CISD correlation energy. 

Finally, Figure 1 offers further support for our contention that our algorithm scales as 0{L A ). This plot shows 
the computational cost of calculating the energy and gradient of a system of N non-interacting H2 molecules as a 
function of N. Because the molecules are strictly non-interacting, P' H = NPh, where P' H and Pji are the number of 
auxiliary functions needed to describe the composite and single system, respectively. Similarly, we use P' A = NPa 
auxiliary functions to represent our excitation operator. As a result, this system provides the perfect test case for 
measuring the computational scaling of out algorithm. The result in Fig. 1 shows that our algorithm scales as 0(L 4 ). 
This measurement is consistent with an examination of our code. The slow step in our algorithm is a series of nested 
do-loops that sum four indices over the range 1 ... Pa or 1 . . . Pjj. Provided that Pa and Pjj both scale as O(L), our 
algorithm then has a complexity of (9(i 4 ), as we observe. 

There are numerous areas of future research. One of the most important issues is performing the efficient THC- 
DF decomposition of the Hamiltonian. Although the authors of [l[ introduced a method to decompose a molecular 
Hamiltonian in terms of Pjj = O(L) auxiliary functions using 0(L ) operations, they recognized that there were very 
large prefactors involved in this process. For this reason, it is worth investigating whether we can construct such a 
decomposition more efficiently, even if the construction process retains the same asymptotic scaling. 

A related issue is the prefactor associated with the electronic structure algorithm presented in this paper. The 

slow step of our algorithm is the calculation of the energy and gradient of the 6-body term (Ah^AS. Since this 

expectation value involves six creation and six annihilation operators, there are 6! = 720 possible matchings of the 
operators, each of which contributes a unique term to the energy and gradient. The code required to perform such 
calculations is immense; our final program contained over 500,000 lines of FORTRAN code even when effort was made 
to store intermediate results and eliminate redundancy. Major improvement could be made to our algorithm either 
by developing analytic methods to reduce the cost of energy /gradient evaluation or by developing methods which 
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FIG. 1: The computational cost of each energy /gradient calculation as a function of the number of basis function L for a system 
of N noninteracting H2 molecules in the STO-6G basis. The blue Xs show the numerical results; for comparison, the dashed 
line shows L 4 scaling. As expected, our algorithm scales asymptotically as 0(L 4 ) 

identify and approximate negligible terms in our equations. 

A final major area of research is the extension of our method to coupled-cluster algorithms. One of the key elements 
of coupled cluster theory is that the excitation operator Ti must only connect occupied orbitals to virtual orbitals; 
there can be no non-zero elements which act within the occupied or virtual subspaces or which connect virtual orbitals 
to occupied orbitals. However, if we make this assumption, it is not clear that the excitation operator T2 can still 
be efficiently written in the THC-DF form, which is vital to the efficiency of our algorithm. In other words, the two 
approximations which are crucial to an THC-DF-based coupled-cluster theory may be mutually incompatible. Future 
studies will have to assess whether this is indeed the case or whether either theory can be modified to produce an 
efficient coupled-cluster algorithm. It is also interesting that it has already been shown in a different context that 
generalized coupled-cluster methods, which do not make the occupied/unoccupied distinction, can exactly represent 
the ground state many-body wavefunction[24]. Our results provide additional motivation for investigating generalized 
coupled-cluster methods. 

In conclusion, we have described an 0(L ) electronic structure algorithm which can be tuned in its accuracy through 
the value of the parameter Pa- When this parameter is 0, we recover the Hartree-Fock result. When the parameter 
is O(L), we rapidly approach the exact CISD result. Although the percentage of correlation energy recovered by 
the CISD approach decreases to as a function of the system size due to its lack of size extensivity, the many 
known approaches to recovering size-extensivity within CISD should be applicable to our method, thereby resotring 
its utility even for large systems [lq. Our hope is that the efficiency of this algorithm will make it computationally 
competitive with existing low-scaling methods like Hartree-Fock and density functional theory, bringing explicitly 
correlated wavefunction-based methods into the realm of practical calculation even for large systems. 
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I. APPENDIX 



The total variational energy is given by 

= ((iffii) + (iff 2 i)) / (ii) (15) 

where the A operator is given by 

A = A + A aa + A aP + A pp (16) 
and where the operators Hi and H2 conserve spin such that they can be written as 

H 1 = Hf+H? (17) 
H 2 = H^ a + H^ + H^ (18) 

For our demonstration, we will consider only the contributions to the total energy that come from the term 



E = {A H?A aa ) (19) 
Using the THC-DF decomposition for the excitation operator, we obtain the expression 

E = Aq ^ hikXajXalZabXbmXbn (clck^CiC^C^ (20) 
ikabjlmn 

To evaluate the expectation value in Eq. (|2T))) . we use the canonical fermionic anticommutation relations. For conve- 
nience, we define the 1-hole reduced density matrix = 8 l k — l D % k , After some tedious algebra, we obtain 

clckC^clcn) = W^ + ^^ + W^ (21) 



+ 1 D l n 1 Ql 1 D\ - 'Dj^Di + ^QfQl 

If we plug Eq. (J2TJ) into Eq. ([2"0]). we obtain 

E = A Q Y,(^[K-D)[x-D.^] aa Zab[x-D- X \ b (22) 

ab 

+Tr[h-D] [ X -D-^] ab Z ab [ X -Q-X\ b 
+ [ X -Q-h-D- X l] aa Z ab [ X -D- X \ b 
+ W-D-x'] aa Z ab [x-Q-h-D- X \ b 
-[x-D-h-Q-x^] ab Z ab [x-D- X ^] ab 

+ [x-Q-x'] ab z ab [ x -Q-h.D. x \ b 



The operations in Eq. (|20[) consist of matrix multiplications and summations over the indices a and b. The matrices 
h, D, and Q are L x L matrices while the matrix x is a Pa X L matrix. Provided that Pa = O(L), all of the matrix 
multiplications will require 0(L 3 ) operations. Finally, the sum over indices a and b runs from 1 to Pa- As long as Pa 
scales linearly with L the sum over a and b can be performed in 0(L 2 ) operations. 
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The value of the gradient can be calculated by differentiating Eq. (|22j) with respect to the various parameters. 

dE/dAo = E/A (23) 
dE/d X ai = AoY, (Tr [h -D][D. *t] ta Z ab [ x ■ D ■ x +] bb (24) 

6 

+Tr[h-D] [ X -D] ai Z ab [x-D- X \ b 
+Tr[h-D] [ X -D. X \ b Z ba [D- X % 
+Tr[h-D] [ X -D- X \ b Z ba [ X -D] al 
+Tr [h-D] [D-x\ b Z ab [ X -Q-x\ b 
+Tr[h-D] [ X -D] bl Z ba [x-Q-X^] aa 
+Tr[h-D] [x-D- X \ a Z ba [Q ■ X % a 
+Ti-[h-D] [x-D- X \ a Z ba [ X -Q] ai 
+ [Q-h-D- X % a Z ab [x-D- X \ b 
+ [ X -Q-h-D] at Z ab [x-D- X \ b 
+ [x-Q-h-D-x\ b Z ba [D- X % 
+ [x-Q-h-D- X \ h Z ba [ X -D] al 
+ [D- X \ a Z ab [x-Q-h-D- X \ b 
+ [ X -D] ai Z ab [x-Q-h-D- X \ b 
+ [x-D. X \ b Z ba [Q.h-D- X % 
+ [x-D- X \ b Z ba [ X -Q-h-D] al 
-[D-h-Q- X % b Z ab [x-D- X '] ab 
-[ X -D-h-Q] bi Z ba [x-D- X \ a 
-[x-D-h-Q-x\ b Z ab [D- X % 
~[x-D-h-Q- X \ a Z ba [ X -D] bi 

+ [Q-A lb z ab [x-Q-h-D- x '] ab 

+ [ X -Q] bl Z ba [ X -Q-h-D-x\ a 

+ [x-Q-x'] ab z ab [Q-h-D- x % 
+ U-Q-x\ a z ba [x-Q-h-D] bi ) 

dE/dZ ab = AoJ2 (Tr [h-D][ X -D- *t] aa [ X -D- x>] bb (25) 

ab 

+Tr[h-D] [x-D- X "] ab [X-Q-X% 
+ [x-Q-h-D- X '] aa [X-D- X \ b 
+ [X-D- X '] aa [x-Q-h-D- X \ b 
-[ X -D.h.Q. X \ b [X-D- X \ b 

+ [x-Q-x^] ab [x-Q-h-D-x^], 

Notice that many of the quantities involved in the above equations are repeated. These can be calculated and stored 
for later use, to speed computation at the expense of a larger memory requirement. The expressions above provide a 
representative example of the kinds of calculations involved in our computation. 
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